Skip to content

Fix integration test failing on first Dependabot PR run#692

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 2 commits intomainfrom
yiming.luo/fix-integration-test-first-run
Apr 13, 2026
Merged

Fix integration test failing on first Dependabot PR run#692
gh-worker-dd-mergequeue-cf854d[bot] merged 2 commits intomainfrom
yiming.luo/fix-integration-test-first-run

Conversation

@lym953
Copy link
Copy Markdown
Contributor

@lym953 lym953 commented Apr 10, 2026

The entire PR (including the summary below) was generated by Claude Code.

Problem

The integration-tests CI job fails on the first run of every Dependabot PR with:

Error:
The security token included in the request is invalid.

This happens during sls package for serverless-extension.yml, but not for serverless-forwarder.yml. Dependabot PRs run without valid AWS credentials, which causes sls package to fail when it tries to call AWS.

Root cause

When serverless.yml defines a top-level layers: section, Serverless Framework v3 calls CloudFormation.describeStacks for each layer in compareWithLastLayer() (in lib/plugins/aws/package/compile/layers.js). This is an optimisation that checks whether a layer has already been uploaded to avoid re-uploading unchanged ones. The error handler in that function only silences errors whose message contains "does not exist" — any other AWS error is re-thrown:

(e) => {
    if (e.message.includes('does not exist')) {
        return;
    }
    throw e;  // re-throws "security token invalid" → sls package fails
}

The forwarder test is immune because our plugin wraps all CloudWatch calls in a broad catch that returns [], silently absorbing credential errors. The extension test has no such safety net.

Fix

Add a lightweight test-only Serverless plugin (integration_tests/offline.js) that hooks into before:package:compileLayers and intercepts CloudFormation.describeStacks calls, immediately rejecting them with a synthetic "does not exist" error. This makes compareWithLastLayer treat every snapshot test run as a fresh stack, which is the correct behaviour for snapshot tests where we never actually deploy.

serverless-extension.yml is updated to load this plugin.

How I tested it

Confirmed the root cause by running sls package locally with placeholder invalid credentials against both configs:

# Extension config (before fix) → EXIT: 1
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE AWS_SECRET_ACCESS_KEY=... serverless package --config serverless-extension.yml

# Forwarder config → EXIT: 0 (errors swallowed by plugin)
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE AWS_SECRET_ACCESS_KEY=... serverless package --config serverless-forwarder.yml

Verified the fix by running the same command after adding offline.js:

# Extension config (after fix) → EXIT: 0, "Service packaged"
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE AWS_SECRET_ACCESS_KEY=... serverless package --config serverless-extension.yml

Verified snapshot is unchanged by applying all the normalisation steps from run_integration_tests.sh to the generated template and diffing against correct_extension_snapshot.json — no differences.

Cross-referenced with PR history: PR #666 (a previous Dependabot PR) had the identical failure on its first run and was fixed only by pushing a run ci commit to trigger a fresh CI run, confirming this is a consistent first-run issue rather than a one-off flake.

When custom Lambda layers are defined, Serverless Framework calls
CloudFormation.describeStacks in compareWithLastLayer() to check
whether the layer has already been uploaded. Its error handler only
silences "does not exist" responses; any other error – including the
"security token invalid" response AWS returns when credentials are
absent – is re-thrown, failing sls package.

Add a lightweight test-only Serverless plugin (offline.js) that
intercepts CloudFormation.describeStacks during packaging and returns
a synthetic "does not exist" error. This makes compareWithLastLayer
treat every run as a fresh stack, which is the correct behaviour for
snapshot tests where we never actually deploy. The snapshot output is
unchanged because S3 key timestamps are already normalised.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.69%. Comparing base (a812da7) to head (6be95f7).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #692   +/-   ##
=======================================
  Coverage   77.69%   77.69%           
=======================================
  Files          12       12           
  Lines        1112     1112           
  Branches      350      350           
=======================================
  Hits          864      864           
  Misses        118      118           
  Partials      130      130           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lym953 lym953 marked this pull request as ready for review April 10, 2026 21:45
@lym953 lym953 requested a review from a team as a code owner April 10, 2026 21:45
@lym953 lym953 requested a review from TalUsvyatsky April 10, 2026 21:45
Copy link
Copy Markdown
Contributor

@ava-silver ava-silver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct and well-explained — the root cause analysis is solid and the offline.js shim does what it needs to do. One thing worth doing alongside this (or in a follow-up) is a naming pass, because the mislabeling is what makes this shim look suspicious in the first place.

These tests never deploy anything — they run sls package, generate a CloudFormation template, and diff it against a checked-in snapshot. That's a snapshot test, not an integration test. Calling them "integration tests" implies real AWS connectivity is expected, which is exactly why the credential failure looked like a bug rather than an obvious non-issue.

The naming is scattered across several places that would all need updating together:

  • integration_tests/ directory → snapshot_tests/ (or packaging_tests/)
  • integration_tests/run_integration_tests.shsnapshot_tests/run_snapshot_tests.sh
  • integration_tests/serverless-extension.yml / serverless-forwarder.yml → move accordingly
  • The CI workflow file and its job name — both currently called integration-tests in the Actions runs
  • The integrationTesting / testingMode plugin option in src/env.ts is a separate concept (it bypasses forwarder ARN validation), so that one is fine to leave alone

For context on why there's no cleaner upstream fix: --noDeploy was removed in Serverless Framework v2.36.0 and sls package is the recommended replacement — which is exactly what these tests already use. The compareWithLastLayer() credential issue has been an open bug since serverless/serverless#8187 (2020), and there's an open feature request for a --artifacts-only flag at serverless/serverless#12969 (Dec 2024) with no movement. So the shim is the right call for now — it just deserves a comment pointing at those issues so future maintainers know this isn't accidental and know when it can be removed.

Comment thread integration_tests/offline.js
Co-authored-by: Ava Silver <ava.silver@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants